Corelab Seminar
2013-2014
Loukas Kavouras (NTUA)
k-means clustering
Abstract.
k-means clustering is a method of vector quantization, originally from signal processing, that is popular for cluster analysis in data mining.
k-means clustering aims to partition n observations into k clusters so that each observation belongs to the cluster with the nearest mean, which serves as a prototype
of the cluster. I will present Lloyd's algorithm, a very simple and fast
algorithm for practical applications that nevertheless has no approximation guarantees. By improving the initialization of the algorithm we get a faster algorithm with
approximation guarantees, named k-means++. Next, I will show a way to parallelize the algorithm, making it much more efficient for large data sets.
Finally, I will show two algorithms in the streaming model.
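As a rough illustration of the two algorithms the talk starts from, here is a minimal pure-Python sketch of Lloyd's algorithm together with k-means++ seeding (sampling each new center with probability proportional to its squared distance from the nearest center already chosen). It assumes points are tuples of floats and uses squared Euclidean distance. The function names `kmeanspp_seed`, `lloyd` and `cost` are hypothetical, and the parallel and streaming variants are not covered.

```python
import random


def sq_dist(p, q):
    # Squared Euclidean distance between two points of equal dimension.
    return sum((a - b) ** 2 for a, b in zip(p, q))


def cost(points, centers):
    # k-means objective: sum over all points of the squared distance
    # to the nearest center.
    return sum(min(sq_dist(p, c) for c in centers) for p in points)


def kmeanspp_seed(points, k, rng):
    # k-means++ seeding: the first center is chosen uniformly at random;
    # each further center is sampled with probability proportional to
    # D(x)^2, the squared distance from x to the closest chosen center.
    centers = [rng.choice(points)]
    while len(centers) < k:
        weights = [min(sq_dist(p, c) for c in centers) for p in points]
        if sum(weights) == 0:
            # Every point coincides with a chosen center; fall back to uniform.
            centers.append(rng.choice(points))
            continue
        centers.append(rng.choices(points, weights=weights, k=1)[0])
    return centers


def lloyd(points, centers, max_iter=100):
    # Lloyd's algorithm: alternate assignment and update steps until the
    # centers stop changing or max_iter iterations have been done.
    centers = [tuple(c) for c in centers]
    for _ in range(max_iter):
        # Assignment step: each point joins the cluster of its nearest center.
        clusters = [[] for _ in centers]
        for p in points:
            j = min(range(len(centers)), key=lambda i: sq_dist(p, centers[i]))
            clusters[j].append(p)
        # Update step: each center moves to the mean of its cluster;
        # an empty cluster keeps its previous center.
        new_centers = []
        for c, members in zip(centers, clusters):
            if members:
                n = len(members)
                new_centers.append(
                    tuple(sum(m[d] for m in members) / n for d in range(len(c)))
                )
            else:
                new_centers.append(c)
        if new_centers == centers:
            break
        centers = new_centers
    return centers


if __name__ == "__main__":
    pts = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
           (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
    centers = lloyd(pts, kmeanspp_seed(pts, 2, random.Random(0)))
    print(centers, cost(pts, centers))
```

Seeding and refinement are kept as separate functions so that plain Lloyd's algorithm (seeded, for example, with k uniformly random points) and k-means++ differ only in how the initial centers are picked. This mirrors the abstract's point that only the initialization changes.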